Abstract. To protect sensitive information in a cross tabulated table, it is a common practice
to suppress some of the cells in the table. This paper investigates four levels of data security of a 
two-dimensional table concerning the effectiveness of this practice. These four levels of data security 
protect the information contained in, respectively, individual cells, individual rows and columns, 
several rows or columns as a whole, and a table as a whole. The paper presents efficient algorithms 
and NP-completeness results for testing and achieving these four levels of data security. All these 
complexity results are obtained by means of fundamental equivalences between the four levels of data 
security of a table and four types of connectivity of a graph constructed from that table.

1. Introduction. Cross tabulated tables are used in a wide variety of documents
to organize and exhibit information. The values of sensitive cells in such tables are 
routinely suppressed to conceal sensitive information. There are two fundamental 
issues concerning the effectiveness of this practice [[j} ||, [|, |[ fj], g, [| |2(], |22|, |23} |. 
One is whether an adversary can deduce significant information about the suppressed 
cells from the published data of a table. The other is how a table maker can suppress 
a small number of cells in addition to the sensitive ones so that the resulting table 
does not leak significant information. 

This paper investigates how to protect the information in a two-dimensional table 
that publishes three types of data (see [16] for examples): (1) the values of all cells
except a set of sensitive ones, which are suppressed, (2) an upper bound and a lower 
bound for each cell, and (3) all row sums and column sums of the complete set of 
cells. The cells may have real or integer values. They may have different bounds, and 
the bounds may be finite or infinite. The upper bound of a cell should be strictly 
greater than its lower bound; otherwise, the value of that cell is immediately known 
even if that cell is suppressed. The cells that are not suppressed also have upper and 
lower bounds. These bounds are necessary because some of the unsuppressed cells 
may later be suppressed to protect the information in the sensitive cells. 

The focus of this paper is on how to protect the type of information defined here. 
A bounded feasible assignment to a table is an assignment of values to the suppressed 
cells such that each row or column adds up to its published sum and the bounds of 
the suppressed cells are all satisfied. A linear combination of the suppressed cells 
is a linear invariant if it has the same value at all bounded feasible assignments 
(see [16] for examples). Intuitively, the information contained in a linear invariant
is unprotected because its value can be uniquely deduced from the published data. 
Five classes of linear invariants are of special significance. A positive invariant is one 
whose coefficients are all nonnegative with at least one coefficient being positive. A 
unitary invariant is one whose coefficients are +1, 0, or -1. A sum invariant is one
whose coefficients are +1 or 0. A rectangular sum invariant is one that sums over all
suppressed cells shared by a set of rows and a set of columns. An invariant cell is a 
suppressed cell that forms a linear invariant all by itself. 
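
To make these definitions concrete, the following is a minimal brute-force sketch, not an algorithm from this paper. It assumes integer-valued cells with finite bounds, and the names `feasible_assignments` and `is_linear_invariant` are hypothetical. It enumerates the bounded feasible assignments of a small table whose four cells are all suppressed and tests whether a given linear combination takes a single value on all of them.

```python
from itertools import product


def feasible_assignments(cells, bounds, row_sums, col_sums):
    """Yield every integer assignment to the suppressed cells that respects
    the bounds and the residual row and column sums (published sums minus
    the published cells). Assumes finite integer bounds (illustration only)."""
    ranges = [range(bounds[c][0], bounds[c][1] + 1) for c in cells]
    for values in product(*ranges):
        assign = dict(zip(cells, values))
        rows_ok = all(
            sum(v for (r, _), v in assign.items() if r == i) == s
            for i, s in row_sums.items()
        )
        cols_ok = all(
            sum(v for (_, c), v in assign.items() if c == j) == s
            for j, s in col_sums.items()
        )
        if rows_ok and cols_ok:
            yield assign


def is_linear_invariant(coeffs, assignments):
    """True if the combination has the same value at every assignment."""
    values = {
        sum(coeffs.get(cell, 0) * v for cell, v in a.items())
        for a in assignments
    }
    return len(values) <= 1


# A 2 x 2 table with all four cells suppressed and bounds [0, 5].
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
bounds = {c: (0, 5) for c in cells}
assignments = list(
    feasible_assignments(cells, bounds, {0: 3, 1: 4}, {0: 5, 1: 2})
)

row_sum = {(0, 0): 1, (0, 1): 1}      # a sum invariant (row 0)
single_cell = {(0, 0): 1}             # would be an invariant cell if constant
difference = {(0, 0): 1, (1, 1): -1}  # a unitary combination (row 0 minus column 1)
```

In this example the row sum and the difference of the two diagonal cells are constant over all bounded feasible assignments, while the single cell is not, so that cell is not an invariant cell.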

Four levels of data security of a table are discussed in this paper. To motivate 
the discussion, suppose that a given table tabulates the quantities of several products 
made by different factories. A row represents a factory, a column records the quantities 
of a product, and a cell contains the quantity of a product made by a factory. Level 
1 protects the suppressed cells individually. A factory wishes to conceal the quantity 
of a particular product. Naturally, that quantity should be suppressed and its precise 
value should not be uniquely determined from the published data of the table. Thus,
a suppressed cell is protected if it is not an invariant cell [12]. Level 2 protects a
row (or column) as a whole. After all suppressed cells are protected, an adversary 
may still be able to obtain useful information by combining the suppressed cells. If a 
factory wishes to protect the information about the quantities of all its products as 
a whole, it must ensure that no information of a sensitive type can be extracted by 
combining the suppressed cells in the row representing that factory. Hence, a row is 
protected if there is no linear invariant of a desired type that combines the suppressed 
cells in that row. Level 3 protects a set of k rows (or k columns) as a whole. Suppose 
that a company owns k factories. It wishes to conceal aggregate information about 
all its factories, not just the information about each individual factory. It should 
require that no information of an important type may be derived by combining the 
suppressed cells in the k rows for its k factories. Thus a set of k rows is protected 
if there is no linear invariant of a desired type that combines the suppressed cells in 
those rows. Level 4 protects the given table as a whole. Further suppose that the 
above company owns all the factories tabulated in the table. It wishes to protect 
aggregate information about all its factories and all their products. It stipulates
that only trivial information may be found among a desired class of combinations of 
the suppressed cells. Thus, a table is protected if it has no linear invariant of a desired 
type that combines its suppressed cells. 

The key contribution of this paper is to establish that the latter three levels of 
data security of a table are equivalent to three types of connectivity of a graph called 
the suppressed graph of that table. Previously, Gusfield showed that the first level of 
data security is equivalent to a certain type of connectivity of the suppressed graph 
[12]. The paper further uses these fundamental equivalences to obtain three sets of
complexity results. Firstly, the second and the fourth levels of data security of a table
can be tested in optimal linear time, and the third level can be tested in polynomial
time. Previously, Gusfield showed how to find all invariant cells of a table and test
for its first level of data security in optimal linear time [12]. Secondly, for each of the
four levels of data security it is an NP-complete problem to compute and suppress the 
minimum number of additional cells in a table in order to achieve the desired level 
of data security. Thirdly, for a large and practical class of tables, the above optimal 
suppression problem for the second and the fourth level of data security can be solved 
in optimal linear time. For the first level of data security, Gusfield showed that the 
optimal suppression problem can be solved in optimal linear time. For the third
level of data security the optimal suppression problem remains open. 

We review basics of graphs and tables in §2, discuss the four levels of data security
in §3 through §5, and compare them in §6.

2. Preliminaries. Every graph in this paper is a mixed graph, i.e., it may contain
both undirected and directed edges with at most one edge between two vertices.
Let T be a table. The suppressed graph H = (A, B, E) and the total graph
H' = (A, B, E') of T are the bipartite graphs constructed here (see Figure 1.1 for an
example). For each row (respectively, column) of T, there is a vertex in A (respectively,
B); this vertex is called a row (respectively, column) vertex. For each cell x
at row i and column j, there is an edge $e \in E'$ between the vertices of row i and
column j. If the value of x is strictly between its lower and upper bounds, then e
is undirected. Otherwise, if the value equals the lower (respectively, upper) bound,
then e points from the row endpoint to the column endpoint (respectively, from the
column endpoint to the row endpoint). E consists of the edges corresponding to the
suppressed cells. Note that H is a subgraph of H' and H' is complete (i.e., for all
$u \in A$ and $v \in B$, E' has exactly one edge between u and v). Also, given an arbitrary
complete bipartite graph and a subgraph on the same vertices, it takes only linear
time to construct a table with these two graphs as its total and suppressed graphs.
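
The construction above can be sketched as follows. This is an illustrative sketch under stated assumptions: the dictionary-based table encoding and the name `build_graphs` are hypothetical and do not come from this paper, and every cell value is assumed to lie within its bounds with the lower bound strictly below the upper bound.

```python
def build_graphs(values, lower, upper, suppressed):
    """Return (total, supp), each mapping a cell (i, j) to its edge.

    An edge is ("undirected", row_vertex, col_vertex) or
    ("directed", tail, head). The table encoding is a hypothetical
    choice made for this illustration.
    """
    total = {}
    for (i, j), x in values.items():
        row, col = ("row", i), ("col", j)
        if lower[(i, j)] < x < upper[(i, j)]:
            # Value strictly between the bounds: undirected edge.
            total[(i, j)] = ("undirected", row, col)
        elif x == lower[(i, j)]:
            # Value at the lower bound: edge points from row to column.
            total[(i, j)] = ("directed", row, col)
        else:
            # Value at the upper bound: edge points from column to row.
            total[(i, j)] = ("directed", col, row)
    # The suppressed graph keeps only the edges of the suppressed cells.
    supp = {cell: e for cell, e in total.items() if cell in suppressed}
    return total, supp


values = {(0, 0): 2, (0, 1): 0, (1, 0): 5, (1, 1): 3}
lower = {cell: 0 for cell in values}
upper = {cell: 5 for cell in values}
total, supp = build_graphs(values, lower, upper, suppressed={(0, 0), (0, 1)})
```

Here the cell at row 0 and column 1 sits at its lower bound and therefore yields an edge directed from its row vertex to its column vertex, whereas the cell at row 1 and column 0 sits at its upper bound and yields the opposite direction.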

A traversable cycle or path is one that can be traversed along its edge directions. 
A direction-blind cycle or path is one that can be traversed if its edge directions are 
disregarded; we often omit the word direction-blind for brevity. A graph is connected 
if each pair of vertices are in a path. A connected component is a maximal connected 
subgraph. A nonsingleton connected component is one with two or more vertices. A 
graph is strongly connected if each pair of vertices are in a traversable cycle. A strong 
component is a maximal strongly connected subgraph. 

The effective area of a linear invariant F of T, denoted by EA(F), is the set of
suppressed cells in the nonzero terms of F. EA(F) is also regarded as a set of edges
in H. F is nonzero if $EA(F) \neq \emptyset$. F is minimal if it is nonzero and T has no nonzero
linear invariant whose effective area is a proper subset of EA(F). Note that given a
minimal linear invariant F, if F' is a nonzero linear invariant with $EA(F') \subseteq EA(F)$,
then F' is also minimal and is a multiple of F. Thus, a minimal linear invariant is
unique up to a multiplicative factor with respect to its effective area.

An edge set of a graph is an edge cut if its removal disconnects a connected 
component. An edge cut is minimal if no proper subset of it is an edge cut. 

Fact 1. Let Z be an edge set of a strong component H' of H. Z is a minimal
edge cut of H' if and only if $H' - Z$ has exactly two connected components, say, $H'_1$
and $H'_2$, and each edge of Z is between $H'_1$ and $H'_2$.

Assume that Z is a minimal edge cut of H'. Z is bipartite if the endpoints of Z
in $H'_1$ are all row vertices or all column vertices. An edge set of H is a (respectively,
bipartite) basic set if it consists of an edge not in any strong component of H or is a
(respectively, bipartite) minimal edge cut of some strong component.

Theorem 2.1 ([16]).

1. A linear invariant of T is minimal if and only if its effective area is a basic
set of H. Also, for each basic set Z of H, there is a minimal linear invariant
F of T with EA(F) = Z.

2. Every minimal linear invariant is a multiple of a unitary invariant. Furthermore,
a minimal linear invariant F of T is a multiple of a sum invariant if
and only if EA(F) is a bipartite basic set of H.

3. For each nonzero linear invariant F of T, there exist unitary minimal linear
invariants $F_1, \ldots, F_k$ of T such that $F = \sum_{i=1}^{k} c_i \cdot F_i$ for some $c_i > 0$,
$EA(F) = \bigcup_{i=1}^{k} EA(F_i)$, and for each $F_i$ and each $e \in EA(F_i)$, the coefficients
of e in F and $F_i$ are either both positive or both negative.



Remark. A referee has indicated that a different proof for Theorem 2.1 from that
in [16] can be constructed by means of conformal vector decomposition [18].

3. Protection of a cell. A suppressed cell of T is protected if it is not an
invariant cell. 

3.1. Cell protection and bridge-freeness. A graph is bridge-free if it has no
edge cut consisting of a single edge.

Theorem 3.1 ([12]).

1. A suppressed cell of T is protected if and only if it is an edge in an edge-simple
traversable cycle of H.

2. The suppressed cells of T are all protected if and only if each connected
component of H is strongly connected and bridge-free.

Corollary 3.2 ([12]). Given H, the unprotected cells of T can be found in
$O(|H|)$ time.
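
The characterization in Theorem 3.1(1) can be checked directly on small instances. The sketch below is a simple quadratic-time illustration and is not the linear-time algorithm of Corollary 3.2; the edge encoding `(u, v, directed)` and the function names are hypothetical. An edge from u to v lies in an edge-simple traversable cycle exactly when v can reach u by a traversable path that avoids that edge, and an undirected edge may be entered from either endpoint.

```python
from collections import defaultdict, deque


def reachable(edges, skip, source, target):
    """Breadth-first search along edge directions, ignoring edge `skip`."""
    adj = defaultdict(list)
    for idx, (u, v, directed) in enumerate(edges):
        if idx == skip:
            continue
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected edges can be traversed both ways
    seen, queue = {source}, deque([source])
    while queue:
        x = queue.popleft()
        if x == target:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def unprotected_edges(edges):
    """Indices of edges lying in no edge-simple traversable cycle."""
    bad = []
    for idx, (u, v, directed) in enumerate(edges):
        closes = reachable(edges, idx, v, u)
        if not directed:
            closes = closes or reachable(edges, idx, u, v)
        if not closes:
            bad.append(idx)
    return bad


# Four suppressed cells forming an undirected 4-cycle, plus one pendant cell.
square = [("r0", "c0", False), ("r0", "c1", False),
          ("r1", "c0", False), ("r1", "c1", False)]
with_pendant = square + [("r2", "c0", False)]
# A directed 4-cycle that can be traversed along its edge directions.
directed_cycle = [("r0", "c0", True), ("c0", "r1", True),
                  ("r1", "c1", True), ("c1", "r0", True)]
```

In the second example the pendant cell is the only suppressed cell in its row, so it is a bridge of the suppressed graph and hence an invariant cell.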

3.2. Optimal suppression problems for cell protection. The problem below
is concerned with suppressing the minimum number of additional cells in T such
that the original and the new suppressed cells in the resulting table are all protected. 

Problem 1 (Protection of All Cells). 

• Input: T and an integer p > 0. 

• Output: Is there a set P consisting of at most p published cells of T such 
that all suppressed cells are protected in the table formed by T with the cells 
in P also suppressed? 

Problem 1 can be reformulated as the graph augmentation problem below.

Problem 2.

• Input: A complete bipartite graph H', a subgraph H, and an integer p > 0.

• Output: Is there a set P of at most p edges in $H' - H$ such that each connected
component of $H \cup P$ is strongly connected and bridge-free?

Lemma 3.3. Problems 1 and 2 can be reduced to each other in linear time.



Proof. The proof follows from Theorem 3.1(2). □ 
The next problem is NP-complete [jl^| . It is used here to prove that Problems 1
and 2 are hard.

Problem 3 (Hitting Set). 

• Input: A finite set S, a nonempty set $W \subseteq 2^S$, and an integer h > 0.

• Output: Is there a subset S' of S such that $|S'| \le h$ and S' contains at least
one element in each set in W?

Theorem 3.4. Problems 1 and 2 are NP-complete.

Proof. Problems 1 and 2 are both in NP. To prove their completeness, by
Lemma 3.3, it suffices to reduce Problem 3 to Problem 2.

Given an instance $S = \{s_1, \ldots, s_\alpha\}$, $W = \{S_1, \ldots, S_\beta\}$, h of Problem 3, an
instance $H' = (A, B, E')$, $H = (A, B, E)$, p of Problem 2 is constructed as follows:

• Rule 1: Let $A = \{a_0, a_1, \ldots, a_\alpha\}$. The vertices $a_1, \ldots, a_\alpha$ correspond to
$s_1, \ldots, s_\alpha$, but $a_0$ corresponds to no $s_i$.

• Rule 2: Let $B = \{b_0, b_1, \ldots, b_\beta\}$. The vertices $b_1, \ldots, b_\beta$ correspond to
the subsets $S_1, \ldots, S_\beta$ of S, but $b_0$ corresponds to no $S_j$.

• Rule 3: Let E' consist of the following edges:

1. The edge between $a_0$ and $b_0$ is $b_0 \to a_0$.

2. For all j with $1 \le j \le \beta$, the edge between $a_0$ and $b_j$ is $a_0 \to b_j$.

3. For all i with $1 \le i \le \alpha$, the edge between $a_i$ and $b_0$ is $a_i \to b_0$.

4. For each $s_i$ and each $S_j$, if $s_i \in S_j$, then the edge between $a_i$ and $b_j$ is
$b_j \to a_i$; otherwise it is $a_i \to b_j$.

• Rule 4: Let $E = \{b_0 \to a_0\} \cup \{a_0 \to b_1, \ldots, a_0 \to b_\beta\}$.

• Rule 5: Let $p = h + \beta$.

The above construction can be easily computed in polynomial time. The next
two claims show that it is indeed a desired reduction from Problem 3 to Problem 2.
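
Rules 1 through 5 can be sketched in code as follows. This is only an illustration of the construction; the vertex encoding `("a", i)` and `("b", j)` and the function names are hypothetical. The second function builds the edge set $P = P_1 \cup P_2$ that a hitting set yields in the first of the two claims.

```python
def build_instance(S, W, h):
    """Build (E_total, E_supp, p) from a Hitting Set instance.

    S is the list s_1..s_alpha, W the list of sets S_1..S_beta, and h the
    budget. Every edge is a (tail, head) pair; vertex names are hypothetical.
    """
    alpha, beta = len(S), len(W)
    a0, b0 = ("a", 0), ("b", 0)
    E_total = {(b0, a0)}                                            # Rule 3(1)
    E_total |= {(a0, ("b", j)) for j in range(1, beta + 1)}         # Rule 3(2)
    E_total |= {(("a", i), b0) for i in range(1, alpha + 1)}        # Rule 3(3)
    for i, s in enumerate(S, start=1):                              # Rule 3(4)
        for j, Sj in enumerate(W, start=1):
            if s in Sj:
                E_total.add((("b", j), ("a", i)))
            else:
                E_total.add((("a", i), ("b", j)))
    E_supp = {(b0, a0)} | {(a0, ("b", j)) for j in range(1, beta + 1)}  # Rule 4
    p = h + beta                                                    # Rule 5
    return E_total, E_supp, p


def edges_from_hitting_set(S, W, hitting):
    """Edges b_j -> a_{i_j} (P_1) and a_{i_j} -> b_0 (P_2) for a hitting set."""
    P = set()
    for j, Sj in enumerate(W, start=1):
        s = next(x for x in S if x in hitting and x in Sj)
        i = S.index(s) + 1
        P.add((("b", j), ("a", i)))
        P.add((("a", i), ("b", 0)))
    return P


S = ["x", "y", "z"]
W = [{"x", "y"}, {"y", "z"}]
E_total, E_supp, p = build_instance(S, W, h=1)
P = edges_from_hitting_set(S, W, hitting={"y"})
```

For this small instance the hitting set {y} gives a set P of three edges, which meets the budget $p = h + \beta = 3$ and uses only edges of $E' - E$.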

Claim 1. If some $S' \subseteq S$ with $|S'| \le h$ has at least one element in each $S_j$, then
some $P \subseteq E' - E$ consists of at most p edges such that every connected component of
$H \cup P$ is strongly connected and bridge-free.

To prove this claim, observe that for each $S_j$, some $s_{i_j} \in S' \cap S_j$ exists. By Rule
3(4), $P_1 = \{b_1 \to a_{i_1}, \ldots, b_\beta \to a_{i_\beta}\}$ exists. By Rule 3(3), $P_2 = \{a_{i_1} \to b_0, \ldots, a_{i_\beta} \to b_0\}$
exists. Let $P = P_1 \cup P_2$. Note that $P_1$ consists of $\beta$ edges. $P_2$ consists of at most
$|S'|$ edges. Thus P has at most $p = \beta + h$ edges. For all j with $1 \le j \le \beta$, the edges
$b_0 \to a_0$, $a_0 \to b_j$, $b_j \to a_{i_j}$, $a_{i_j} \to b_0$ form a vertex-simple traversable cycle. Because
$E \cup P$ consists of the edges in these cycles, every connected component of $H \cup P$ is
strongly connected and bridge-free. This finishes the proof of Claim 1.

Claim 2. If some $P \subseteq E' - E$ consists of at most p edges such that every
connected component of $H \cup P$ is strongly connected and bridge-free, then some $S' \subseteq S$
with $|S'| \le h$ has at least one element in each $S_j$.

To prove this claim, observe that for all j with $1 \le j \le \beta$, by Rule 4, E contains
$a_0 \to b_j$ but no edge pointing from $b_j$. Because every connected component of $H \cup P$ is
strongly connected, P contains an edge $b_j \to a_{i_j}$ for some $i_j$. By Rule 3(4), $s_{i_j} \in S_j$.
Let $S' = \{s_{i_1}, \ldots, s_{i_\beta}\}$. Note that P contains $b_1 \to a_{i_1}, \ldots, b_\beta \to a_{i_\beta}$, but E contains
no edges pointing from $\{a_{i_1}, \ldots, a_{i_\beta}\}$. Because every connected component of $H \cup P$ is
strongly connected, P must also contain at least one edge pointing from each vertex
in $\{a_{i_1}, \ldots, a_{i_\beta}\}$. Thus P contains at least $|S'| + \beta$ edges. Then $|S'| \le h$ because
$|P| \le \beta + h$. This finishes the proof of Claim 2 and thus that of Theorem 3.4. □



The next two problems are optimization versions of Problems 1 and 2 for undirected
graphs and tables whose total graphs are undirected.
Problem 4 (Protection of All Cells). 

• Input: The suppressed graph of a table T whose total graph is undirected. 

• Output: A set P consisting of the smallest number of published cells of T 
such that all suppressed cells are protected in the table formed by T with the 
cells in P also suppressed. 

Problem 5. 

• Input: A bipartite undirected graph H = (A, B,E). 

• Output: A set P consisting of the smallest number of undirected edges 
between A and B but not in E such that every connected component of 
$(A, B, E \cup P)$ is bridge-free.

Note that Problem 5 need not specify H' because it is undirected and thus is
unique for H. Similarly, $H \cup P$ is always strongly connected.
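
For undirected graphs, whether a candidate edge set P meets the requirement of Problem 5 can be tested by listing the bridges of $(A, B, E \cup P)$. The sketch below uses the standard depth-first low-point technique; it only tests a given graph and is not the augmentation algorithm cited for Theorem 3.6. The function name and edge encoding are hypothetical, and recursion is used for brevity, so it suits small graphs only.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return the sorted indices of the bridges of an undirected graph.

    `edges` is a list of (u, v) pairs with at most one edge per vertex pair.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc, low, bridges = {}, {}, []

    def dfs(u, parent_edge):
        disc[u] = low[u] = len(disc)  # discovery order doubles as timestamp
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if v in disc:
                # Back edge: u can reach an earlier vertex without the tree path.
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    # Nothing below v reaches u or above, so edge idx is a bridge.
                    bridges.append(idx)

    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return sorted(bridges)


square = [("r0", "c0"), ("r0", "c1"), ("r1", "c0"), ("r1", "c1")]
with_pendant = square + [("r2", "c0")]
repaired = with_pendant + [("r2", "c1")]  # one added edge removes the bridge
```

In the last example, suppressing the single additional cell at row r2 and column c1 makes every connected component bridge-free.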

Lemma 3.5. Problems 4 and 5 can be reduced to each other in linear time.



Proof. The proof is similar to that of Lemma 3.3. □ 



Theorem 3.6 ([p"l|). Problem 5 is solvable in linear time; thus so is Problem 4.

4. Protection of rows and columns. This section discusses the data security
of a table at Levels 2 and 3 in a unified framework. Let EA(R) denote the set of
suppressed cells in a row or column R. Let $R = \sum_{e \in EA(R)} e$. Let $R_1, \ldots, R_k$ be k
rows or k columns of T, but no mixed case. For Level 3 data security, $\{R_1, \ldots, R_k\}$
is protected with respect to the linear invariants (respectively, the positive invariants,
the unitary invariants, the sum invariants, or the rectangular sum invariants) if the
conditions below hold:

1. Each linear invariant (respectively, positive invariant, unitary invariant, sum
invariant, or rectangular sum invariant) F of T with $EA(F) \subseteq \bigcup_{i=1}^{k} EA(R_i)$
is a linear combination of $R_1, \ldots, R_k$.

2. No suppressed cell of $R_1, \ldots, R_k$ is an invariant cell.

Level 2 data security is a special case of Level 3 with k = 1 and its definitions can 
be simplified. A row or column R is protected with respect to the linear invariants 
(respectively, the positive invariants, the unitary invariants, or the sum invariants) if 
the conditions below hold: 

1. Each linear invariant (respectively, positive invariant, unitary invariant, or
sum invariant) F with $EA(F) \subseteq EA(R)$ is a multiple of R.

2. No suppressed cell in R is an invariant cell. 

We do not explicitly consider the protection of R with respect to the rectangular sum 
invariants because for k = 1 these invariants are the same as the sum invariants. Also, 
the five types of invariants here are implicitly considered for cell protection because 
a linear invariant with exactly one nonzero term is essentially an invariant cell. 

The two conditions in the definitions are based on technical considerations. No
matter how many cells in T are suppressed, $R_1, \ldots, R_k$ and their linear combinations
are always linear invariants. Thus the first condition gives the best possible protection
for $R_1, \ldots, R_k$ as a whole. If $R_i$ has either no suppressed cell or at least two, the first
condition implies the second one; otherwise, the first condition holds trivially but the
only suppressed cell in $R_i$ is an invariant cell. The second condition is adopted to avoid
this undesirable situation.

These definitions also require that $R_1, \ldots, R_k$ be all rows or all columns. In
these two pure cases, $EA(R_1), \ldots, EA(R_k)$ are pairwise disjoint. Therefore, a linear
combination of $R_1, \ldots, R_k$ has a very simple structure and encodes essentially the
same information as do $R_1, \ldots, R_k$. In contrast, if at least one $R_i$ is a row and at
least one $R_j$ is a column, then a linear combination of $R_1, \ldots, R_k$ may have a very
complex structure and may encode very different information from that contained in
$R_1, \ldots, R_k$. Furthermore, unlike in the two pure cases, these definitions do not seem
to have useful characterizations in the mixed case.

The importance of the first four types of invariants considered in the definitions
is evident. The fifth type, a rectangular sum invariant, is motivated by a popular
technique for protecting information in a table. Let e be an invariant cell at row i 
and column j. To protect e, row i can be split into several rows, and column j into 
several columns. Correspondingly, e is split into four or more cells. Then enough 
of these refined cells can be suppressed to ensure that each suppressed refined cell is 
protected. However, the sum of the suppressed refined cells of e is a rectangular sum 
invariant. This property can be used to uniquely determine the value of e. Thus the 
consideration of rectangular sum invariants renders this refinement approach useless 
at the third level of data security. 

4.1. Equivalence of k row-column protection. This section shows that the 
five definitions of k row-column protection are all equivalent. 
Lemma 4.1. Every sum minimal invariant is rectangular.

Proof. Let F be a sum minimal invariant of T. If EA(F) consists of an edge
not in any strong component of H, then F is trivially rectangular. Otherwise, by
Theorem 2.1, EA(F) is a bipartite minimal edge cut of a strong component H' of H.
By Fact 1, $H' - EA(F)$ has two connected components $H'_1$ and $H'_2$. Let $U_1$ and $U_2$
be the sets of endpoints of EA(F) in $H'_1$ and $H'_2$, respectively. By the bipartiteness
of EA(F), without loss of generality the vertices in $U_1$ are rows in T and those in $U_2$
are columns. Then F is rectangular because EA(F) consists of the edges between $U_1$
and $U_2$ in H. □

Lemma 4.2. If no $EA(R_i)$ is empty, the statements below are equivalent:

1. Every positive invariant F with $EA(F) \subseteq \bigcup_{i=1}^{k} EA(R_i)$ is a linear combination
of $R_1, \ldots, R_k$.

2. Every sum invariant F with $EA(F) \subseteq \bigcup_{i=1}^{k} EA(R_i)$ is a linear combination
of $R_1, \ldots, R_k$.

3. Every rectangular sum invariant F with $EA(F) \subseteq \bigcup_{i=1}^{k} EA(R_i)$ is a linear
combination of $R_1, \ldots, R_k$.

4. $R_1, \ldots, R_k$ are the only sum minimal invariants of T whose effective areas
are subsets of $\bigcup_{i=1}^{k} EA(R_i)$.

Proof. The directions $1 \Rightarrow 2 \Rightarrow 3$ are straightforward. The direction $4 \Rightarrow 1$ follows
from the fact that by Statement 4, $R_1, \ldots, R_k$ are the only factors in the decomposition
in Theorem 2.1(3) for a positive invariant F with $EA(F) \subseteq \bigcup_{i=1}^{k} EA(R_i)$. To prove
$3 \Rightarrow 4$, note that because $R_j$ is a positive invariant for all $R_j$, by Theorem 2.1(3) there
is a sum minimal invariant F with $EA(F) \subseteq EA(R_j)$. Since F is also rectangular,
by Statement 3, $F = \sum_{i=1}^{k} c_i \cdot R_i$ for some $c_i$. Because $R_1, \ldots, R_k$ share no variable,
by the minimality of F and coefficient comparison $R_j$ equals F and thus is a sum
minimal invariant. To prove the desired uniqueness of $R_1, \ldots, R_k$, let F' be a sum
minimal invariant with $EA(F') \subseteq \bigcup_{i=1}^{k} EA(R_i)$. By Lemma 4.1, F' is rectangular.
By Statement 3, $F' = \sum_{i=1}^{k} c'_i \cdot R_i$ for some $c'_i$. Because F' is nonzero, some $c'_h \neq 0$.
Because $R_1, \ldots, R_k$ do not share variables, $EA(R_h) \subseteq EA(F')$. Then, $F' = R_h$ by
coefficient comparison and the minimality of F'. □

Lemma 4.3. If no $EA(R_i)$ is empty, the statements below are equivalent:

1. Every linear invariant of T whose effective area is a subset of $\bigcup_{i=1}^{k} EA(R_i)$ is
a linear combination of $R_1, \ldots, R_k$.

2. Every unitary invariant whose effective area is a subset of $\bigcup_{i=1}^{k} EA(R_i)$ is a
linear combination of $R_1, \ldots, R_k$.

3. $R_1, \ldots, R_k$ and their nonzero multiples are the only minimal linear invariants
of T whose effective areas are subsets of $\bigcup_{i=1}^{k} EA(R_i)$.

Proof. The proof is similar to that of Lemma 4.2. □

Lemma 4.4. $\{R_1, \ldots, R_k\}$ is protected with respect to the positive invariants
(respectively, the linear invariants) if and only if the following statements hold:

1. For each strong component D of H and each vertex $R_i$ contained in D, the
component D contains all edges incident to $R_i$ in H.

2. The nonempty sets among $EA(R_1), \ldots, EA(R_k)$ are the only bipartite minimal
edge cuts (respectively, the only minimal edge cuts) of the strong components
of H among the subsets of $\bigcup_{i=1}^{k} EA(R_i)$.

3. Each vertex $R_i$ is either isolated or incident to two or more edges in H.

Proof. The proof of the lemma for the positive invariants and that for the general
invariants are similar; only the former is detailed here. For the direction $\Rightarrow$, Statement
3 follows from the second condition of the definition of $\{R_1, \ldots, R_k\}$ being protected.
Then, Statements 1 and 2 follow from Lemma 4.2(1), 4.2(4), and Theorem 2.1(1),
2.1(2). For the direction $\Leftarrow$, by Statements 1 and 2, Theorem 2.1, and Lemma 4.2,
the first condition of $\{R_1, \ldots, R_k\}$ being protected is satisfied. The second condition
then follows from Statement 3. □

A set of vertices in a connected graph is a vertex cut if its removal disconnects 
the graph. 

Fact 2. If each $EA(R_i)$ is included in the strong component of H that contains
$R_i$, then the following statements are equivalent:

1. Among the subsets of $\bigcup_{i=1}^{k} EA(R_i)$, the nonempty sets $EA(R_i)$ are the only
minimal edge cuts of the strong components of H.

2. Among the subsets of $\bigcup_{i=1}^{k} EA(R_i)$, the nonempty sets $EA(R_i)$ are the only
bipartite minimal edge cuts of the strong components of H.

3. $\{R_1, \ldots, R_k\}$ includes no vertex cut of any strong component of H.

Theorem 4.5. The five definitions of a set of k rows or k columns being protected
are all equivalent.

Proof. If some $EA(R_i) = \emptyset$, then $\{R_1, \ldots, R_k\}$ is protected if and only if
$\{R_1, \ldots, R_k\} - \{R_i\}$ is protected. Thus without loss of generality assume that no
$EA(R_i)$ is empty. Then, by Lemma 4.2 the protection definitions with respect to the
positive, sum, and rectangular invariants are all equivalent. Similarly, by Lemma 4.3,
those with respect to the general and unitary invariants are also equivalent. This
theorem then follows directly from Lemma 4.4 and Fact 2. □

4.2. k Row-column protection and bipartite-(k + 1)-connectivity. A connected
bipartite graph Q = (X, Y, I) is bipartite-(k + 1)-connected if $|X| \ge k + 1$,
$|Y| \ge k + 1$, and neither X nor Y includes a vertex cut of at most k vertices. Q is
(k + 1)-connected if $|X \cup Y| > k + 1$ and there is no vertex cut of at most k vertices.

Lemma 4.6. $\{R_1, \ldots, R_k\}$ is protected if and only if the statements below hold:

1. For each strong component D of H and each vertex $R_i \in D$, D contains all
the edges incident to $R_i$ in H.

2. $\{R_1, \ldots, R_k\}$ includes no vertex cut of any strong component of H.

3. Each $R_i$ is either isolated or incident to two or more edges in H.

Proof. This lemma follows from Theorem 4.5, Lemma 4.4, and Fact 2. □

Theorem 4.7. Every set of at most k rows or k columns of T is protected if
and only if every nonsingleton connected component of H is strongly connected and
bipartite-(k + 1)-connected.

Proof. This theorem follows directly from Lemma 4.6. □

Corollary 4.8.

1. Given H and $\{R_1, \ldots, R_k\}$, whether $\{R_1, \ldots, R_k\}$ is protected can be determined
in $O(|H|)$ time.

2. Given H and k, whether T has any unprotected set of at most k rows or k
columns can be answered in $O(k^4 n^2)$ time, where n is the number of vertices in
H.

Proof. Statement 1 follows from Lemma 4.6 in a straightforward manner using
linear-time algorithms for connectivity and strong connectivity ||. Statement 2 follows
from Theorem 4.7. The key step is to test the bipartite-(k + 1)-connectivity of
H within the stated time bound. We first construct two auxiliary graphs $H_A$ and
$H_B$. For each vertex $u \in A$, replace u with k + 1 copies in $H_A$. For each $u \in A$
and each edge e in H between u and a vertex $v \in B$, replace e with k + 1 copies
between v and the k + 1 copies of u in $H_A$. $H_B$ is obtained by exchanging A and
B in the construction. Because H is connected and each vertex in A is duplicated
k + 1 times, $H_A$ has a vertex cut U of at most k vertices if and only if U is a subset
of B and is a vertex cut of H. A symmetrical statement for B also holds. Thus H
is bipartite-(k + 1)-connected if and only if both $H_A$ and $H_B$ are (k + 1)-connected.
This corollary then follows from the fact |^|, [17] that the (k + 1)-connectivity of an
m-vertex graph can be tested in $O(k^2 m^2)$ time if $k < \sqrt{m}$. □
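
The duplication step in the proof above can be illustrated on a small undirected example. The sketch below is a brute-force check rather than the cited connectivity-testing algorithm; the function names are hypothetical, edges are written as (vertex in A, vertex in B), and only the vertex-cut condition is examined, not the size conditions on the two vertex classes.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Direction-blind connectivity of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen == vertices


def duplicate_side(A, B, edges, k):
    """Build H_A: every vertex u of A becomes the k + 1 copies (u, 0..k)."""
    copies = [(u, c) for u in A for c in range(k + 1)]
    new_edges = [((u, c), v) for (u, v) in edges for c in range(k + 1)]
    return copies + list(B), new_edges


def small_cuts(vertices, edges, k, candidates=None):
    """All sets of at most k candidate vertices whose removal disconnects."""
    pool = list(vertices if candidates is None else candidates)
    cuts = []
    for size in range(1, k + 1):
        for U in combinations(pool, size):
            if not is_connected(set(vertices) - set(U), edges):
                cuts.append(set(U))
    return cuts


A, B = ["r0", "r1", "r2"], ["c0", "c1"]
edges = [("r0", "c0"), ("r1", "c0"), ("r1", "c1"), ("r2", "c1")]
k = 1
VA, EA = duplicate_side(A, B, edges, k)
cuts_in_HA = small_cuts(VA, EA, k)
cuts_of_H_in_B = small_cuts(A + B, edges, k, candidates=B)
cuts_of_H_in_A = small_cuts(A + B, edges, k, candidates=A)
```

On this example the small vertex cuts of $H_A$ are exactly the small vertex cuts of H that lie inside B, as the proof states; the cut {r1} inside A is invisible in $H_A$ and would be detected in $H_B$ instead.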

Corollary 4.9. Given H, it takes $O(|H|)$ time to find the unprotected rows and
columns of T and decide whether all individual rows and columns of T are protected.

Proof. This corollary follows from Lemma 4.6 in a straightforward manner using
linear-time algorithms for strong connectivity and 2-connectivity [3]. □

4.3. Optimal suppression problems for k row-column protection. 

Problem 6 (Protection of All Sets). 

• Input: T and two integers k > 0 and p > 0.

• Output: Is there a set P consisting of at most p published cells of T such 
that every set of at most k rows or k columns is protected in the table formed 
by T with the cells in P also suppressed? 

Problem 6 can be reformulated as the following graph augmentation problem.

Problem 7.

• Input: A complete bipartite graph H', a subgraph H, and integers k > 0 and
p > 0.

• Output: Is there a set P of at most p edges in $H' - H$ such that each nonsingleton
connected component of $H \cup P$ is strongly connected and
bipartite-(k + 1)-connected?

Lemma 4.10. Problems 6 and 7 can be reduced to each other in linear time.



Proof. The proof follows from Theorem 4.7. □ 

Theorem 4.11. For k = 1, Problems 6 and 7 are NP-complete. Thus, both
problems are NP-complete for general k.

Proof. Problems 6 and 7 are both in NP. To prove their completeness for k = 1,
by Lemma 4.10, it suffices to reduce Problem 3 to Problem 7 with k = 1. Given an
instance $S = \{s_1, \ldots, s_\alpha\}$, $W = \{S_1, \ldots, S_\beta\}$, h of Problem 3, let $H' = (A, B, E')$,
$H = (A, B, E)$, p be the instance constructed for Theorem 3.4. The next two claims show
that this transformation is indeed a desired reduction.

Claim 3. If some $S' \subseteq S$ with $|S'| \le h$ has at least one element in each $S_j$, then
some $P \subseteq E' - E$ consists of at most p edges such that every nonsingleton connected
component of $H \cup P$ is strongly connected and bipartite-2-connected.

To prove this claim, observe that for each $S_j$, some $s_{i_j} \in S' \cap S_j$ exists. Let
$P_1 = \{b_1 \to a_{i_1}, \ldots, b_\beta \to a_{i_\beta}\}$, which exists by Rule 3(4) of the construction of H', H,
and p. By Rule 3(3), $P_2 = \{a_{i_1} \to b_0, \ldots, a_{i_\beta} \to b_0\}$ exists. Let $P = P_1 \cup P_2$. Note that
$P_1$ consists of $\beta$ edges. $P_2$ consists of at most $|S'|$ edges. Thus P has at most $p = \beta + h$
edges. For all j with $1 \le j \le \beta$, the edges $b_0 \to a_0$, $a_0 \to b_j$, $b_j \to a_{i_j}$, $a_{i_j} \to b_0$ form
a vertex-simple traversable cycle. These cycles all go through $b_0 \to a_0$ and form the
only nonsingleton connected component of $H \cup P$. This component is clearly strongly
connected and bipartite-2-connected. This finishes the proof of Claim 3.

Claim 4. If some $P \subseteq E' - E$ consists of at most p edges such that every
nonsingleton connected component of $H \cup P$ is strongly connected and bipartite-2-connected,
then some $S' \subseteq S$ with $|S'| \le h$ has at least one element in each $S_j$.

The proof of this claim is the same as that of Claim 2 and uses only the componentwise
strong connectivity of $H \cup P$. This finishes the proof of Theorem 4.11. □

The next two problems are variants of Problems 6 and 7.
Problem 8 (Protection of All Sets). 

• Input: The suppressed graph of a table T whose total graph is undirected, 
and a positive integer k. 

• Output: A set P consisting of the smallest number of published cells of T 
such that every set of at most k rows or k columns is protected in the table 
formed by T with the cells in P also suppressed. 

Problem 9.

• Input: A bipartite undirected graph $\mathcal{H} = (A, B, E)$ and a positive integer k.

• Output: A set P consisting of the smallest number of undirected edges between A and B but
not in E such that every nonsingleton connected component of $(A, B, E \cup P)$ is
bipartite-(k + 1)-connected.

Lemma 4.12. Problems 8 and 9 can be reduced to each other in linear time.
Proof. The proof is similar to that of Lemma 4.10. □



Theorem 4.13 (|]|). For k = 1, Problem 9 can be solved in linear time.

Theorem 4.14. For k = 1, Problem 8 can be solved in linear time.

Proof. The proof follows from Lemma 4.12 and Theorem 4.13. □



5. Protection of a table. Let $R_1, \ldots, R_n$ be the rows and columns of T. T is
protected with respect to the positive invariants (respectively, the sum invariants, or
the rectangular sum invariants) if it satisfies the conditions below:

1. Every positive invariant (respectively, nonzero sum invariant, or nonzero rectangular
sum invariant) of T is a positive linear combination of $R_1, \ldots, R_n$,
where a positive linear combination is one that has no negative coefficients
and at least one positive coefficient.

2. T has no invariant cell. 

These definitions allow only positive linear combinations, because general linear
combinations of $R_1, \ldots, R_n$ generate all linear invariants and leave nothing for
protection. This restriction excludes protection with respect to the general linear
invariants. As a result, protection with respect to the unitary invariants is also not
considered, because by Theorem 2.4, these invariants have the same structures as the
general linear invariants do.
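
The notion of a positive linear combination used in these definitions can be made concrete with a small Python sketch. The function name and the sample coefficient lists are illustrative assumptions; the test itself restates the definition given in condition 1.

```python
def is_positive_combination(coeffs):
    """No negative coefficient and at least one strictly positive one."""
    return all(c >= 0 for c in coeffs) and any(c > 0 for c in coeffs)


print(is_positive_combination([0, 2, 1]))    # allowed: nonnegative, one positive
print(is_positive_combination([0, 0, 0]))    # rejected: no positive coefficient
print(is_positive_combination([1, -1, 3]))   # rejected: a negative coefficient
```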

Theorem 5.1. The three definitions of a table being protected are all equivalent. 

Proof. Because a protected table has no invariant cells, each row or column has
either no suppressed cell or at least two suppressed cells. It suffices to prove that if
T satisfies this condition, then the statements below are equivalent:

1. Every positive invariant is a positive linear combination of $R_1, \ldots, R_n$.

2. Every nonzero sum invariant is a positive linear combination of $R_1, \ldots, R_n$.

3. Every nonzero rectangular sum invariant of T is a positive linear combination
of $R_1, \ldots, R_n$.

4. The nonzero linear invariants among $R_1, \ldots, R_n$ are the only sum minimal
invariants of T.

The directions 1 ⇒ 2 and 2 ⇒ 3 are straightforward. The direction 4 ⇒ 1
follows from Theorem 2.1(3). To prove the direction 3 ⇒ 4, note that for each $R_j$
with $EA(R_j) \ne \emptyset$, $R_j$ is a nonzero sum invariant. By Theorem 2.1 there is a sum
minimal invariant $F$ with $EA(F) \subseteq EA(R_j)$. $F$ is also rectangular. By Statement
3, $F = \sum_{i=1}^{n} c_i \cdot R_i$ where $c_i \ge 0$. By coefficient comparison there is some
$c_h > 0$ with $EA(R_h) \ne \emptyset$. Because $c_h > 0$,
$\emptyset \ne EA(R_h) \subseteq EA(F) \subseteq EA(R_j)$. Then
$R_h = R_j$ because two distinct $R_i$ cannot share more than one cell and each nonempty
$EA(R_i)$ contains at least two cells. Thus $R_j$ equals $F$ and is a sum minimal invariant.
To prove the desired uniqueness of $R_1, \ldots, R_n$, let $F'$ be a sum minimal invariant
with $EA(F') \subseteq \bigcup_{i=1}^{n} EA(R_i)$. By Lemma [D], $F'$ is rectangular. By
Statement 3, $F' = \sum_{i=1}^{n} c'_i \cdot R_i$ where $c'_i \ge 0$. By coefficient comparison
there is some $c'_j > 0$ with $EA(R_j) \ne \emptyset$. Because $c'_j > 0$,
$EA(R_j) \subseteq EA(F')$. Then $F' = R_j$ by coefficient comparison and the minimality
of $F'$. □
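
The proof starts from a simple structural condition: each row or column has either no suppressed cell or at least two. A minimal Python sketch of that check follows. The Boolean-matrix encoding of the suppression pattern and the function name are assumptions made for illustration, and the condition is only the necessary one stated in the proof, not a full test for protection.

```python
def rows_and_columns_ok(suppressed):
    """suppressed[i][j] is True when cell (i, j) is suppressed.

    Returns True if every row and every column contains either no
    suppressed cell or at least two suppressed cells.
    """
    row_counts = [sum(row) for row in suppressed]
    col_counts = [sum(col) for col in zip(*suppressed)]
    return all(count != 1 for count in row_counts + col_counts)


# Hypothetical 3 x 3 suppression patterns.
block = [[True, True, False],
         [True, True, False],
         [False, False, False]]
single = [[True, False, False],
          [False, False, False],
          [False, False, False]]
print(rows_and_columns_ok(block), rows_and_columns_ok(single))
```

A row or column with exactly one suppressed cell fails the check, since that cell's value follows from the published row or column sum.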

5.1. Table protection and bipartite-completeness. A graph $Q = (X, Y, I)$
is bipartite-complete if it is complete, $|X| \ge 2$ and $|Y| \ge 2$.

Fact 3. Let $u_1, \ldots, u_g$ be the vertices in $Q$. Let $EA(u_i)$ be the set of edges
incident to $u_i$. Then $Q$ is bipartite-complete if and only if it is bridge-free and has
more than one vertex, and the sets $EA(u_i)$ are its only bipartite minimal edge cuts.

Theorem 5.2. T is protected if and only if each nonsingleton connected component
of $\mathcal{H}$ is strongly connected and bipartite-complete.

Proof. By Fact 3, it suffices to prove that the following statements are equivalent:

1. T is protected.

2. The nonzero invariants among $R_1, \ldots, R_n$ are the only sum minimal invariants
of T. Also each $R_i$ contains either no suppressed cell or at least two suppressed
cells.

3. Each connected component of $\mathcal{H}$ is strongly connected and bridge-free. Also
the nonempty sets among $EA(R_1), \ldots, EA(R_n)$ are the only bipartite minimal
edge cuts of the strong components of $\mathcal{H}$.

The equivalence 1 ⇔ 2 follows from the proof of Theorem 5.1. The equivalence 2 ⇔ 3
follows from Theorems 2.1 and 3.1. □

Corollary 5.3. Given $\mathcal{H}$, it takes linear time in the size of $\mathcal{H}$ to
determine whether T is protected.



Proof. This is an immediate corollary of Theorem 5.2. □ 
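
The criterion of Theorem 5.2 can be checked directly on a suppressed graph. The Python sketch below is one straightforward way to do so, not necessarily the procedure the corollary has in mind. It assumes the graph is given as row labels, column labels, and a list of arcs between them, with each undirected edge listed in both directions; this encoding and the function name are illustrative assumptions.

```python
from collections import defaultdict


def is_protected(rows, cols, arcs):
    """Check the criterion of Theorem 5.2 on a suppressed graph.

    rows, cols: vertex labels for the rows and columns of the table.
    arcs: iterable of (u, v) pairs between a row and a column vertex; an
          undirected edge is assumed to appear in both directions.
    """
    rows, cols = set(rows), set(cols)
    succ, pred, nbrs = defaultdict(set), defaultdict(set), defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
        pred[v].add(u)
        nbrs[u].add(v)
        nbrs[v].add(u)

    def reach(start, adj):
        """Vertices reachable from start along the adjacency map adj."""
        seen, stack = {start}, [start]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    visited = set()
    for v in rows | cols:
        if v in visited:
            continue
        comp = reach(v, nbrs)          # connected component, ignoring directions
        visited |= comp
        if len(comp) == 1:
            continue                   # singleton components are exempt
        xs, ys = comp & rows, comp & cols
        if len(xs) < 2 or len(ys) < 2:
            return False
        # Bipartite-complete: every row vertex is adjacent to every column vertex.
        if any(ys - nbrs[x] for x in xs):
            return False
        # Strongly connected: v reaches, and is reached from, the whole component.
        if reach(v, succ) != comp or reach(v, pred) != comp:
            return False
    return True


# Hypothetical 2 x 2 example whose four suppressed cells form a directed cycle.
cycle = [("R1", "C1"), ("C1", "R2"), ("R2", "C2"), ("C2", "R1")]
print(is_protected(["R1", "R2"], ["C1", "C2"], cycle))
```

Strong connectivity is tested with one forward and one backward search from a single vertex of each component, so each arc is scanned only a constant number of times.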



5.2. Optimal suppression problems for table protection. 

Problem 10 (Protection of a Table). 

• Input: T and a nonnegative integer p. 

• Output: Is there a set P consisting of at most p published cells of T such 
that the table formed by T with the cells in P also suppressed is protected? 

Problem 10 can be reformulated as the following graph augmentation problem.

Problem 11.

• Input: A complete bipartite graph $\mathcal{H}'$, a subgraph $\mathcal{H}$, and an
integer $p \ge 0$.

• Output: Is there a set P of at most p edges in $\mathcal{H}' - \mathcal{H}$ such that each
nonsingleton connected component of $\mathcal{H} \cup P$ is strongly connected and
bipartite-complete?



Lemma 5.4. Problems 10 and 11 can be reduced to each other in linear time.
Proof. The proof follows from Theorem 5.2. □

Theorem 5.5. Problems 10 and 11 are NP-complete.

Proof. Problems 10 and 11 are both in NP. To prove their completeness, by
Lemma 5.4, it suffices to reduce the hitting set problem to Problem 11. Given an instance
$S = \{s_1, \ldots, s_\alpha\}$, $W = \{S_1, \ldots, S_\beta\}$, $h$ of the hitting set problem,
let $\mathcal{H}' = (A, B, E')$, $\mathcal{H} = (A, B, E)$, $p$
be the instance constructed for Theorem 3.4 with the modification below:

• Rule 5': Let $p = (\beta + 1) \cdot h$.

This construction can be computed in polynomial time. The next two claims
show that it is a desired reduction from the hitting set problem to Problem 11.

Claim 5. If some $S' \subseteq S$ with $|S'| \le h$ has at least one element in each $S_j$, then
some $P \subseteq E' - E$ consists of at most $p$ edges such that every nonsingleton connected
component of $\mathcal{H} \cup P$ is strongly connected and bipartite-complete.

To prove this claim, observe that for each $S_j$, some $s_{i_j} \in S' \cap S_j$ exists. Let
$A' = \{a_{i_1}, \ldots, a_{i_\beta}\}$. Let $B' = \{b_1, \ldots, b_\beta\}$. Let $P_1$ be the set
of edges in $E'$ from $B'$ to $A'$. Let $P_2$ be the set of edges in $E'$ from $A'$ to $b_0$. Let
$P = P_1 \cup P_2$. Note that $P$ has at most $p = (\beta + 1) \cdot h$ edges because $A'$ has at
most $|S'| \le h$ vertices. For each $j$ with $1 \le j \le \beta$, the edge $b_j \to a_{i_j}$ is in
$P_1$ by Rule 3(4) of the construction of $\mathcal{H}'$, $\mathcal{H}$, and $p$. Also,
$b_0 \to a_0$, $a_0 \to b_j$, and $a_{i_j} \to b_0$ are in $\mathcal{H} \cup P$. These four edges
form a vertex-simple traversable cycle. These cycles form the only nonsingleton connected
component in $\mathcal{H} \cup P$. Because these cycles all go through $a_0$, this component is
strongly connected. By the choice of $P$, this component is bipartite-complete. This
finishes the proof of Claim 5.
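
The edge count used in this proof can be spelled out step by step. The derivation below uses only the definitions of A', B', P_1, and P_2 given above, together with the facts that B' has β vertices and that A' has at most |S'| vertices.

```latex
\begin{align*}
|P_1| &\le |B'| \cdot |A'| = \beta \, |A'|, \\
|P_2| &\le |A'|, \\
|P|   &\le |P_1| + |P_2| \le (\beta + 1)\,|A'| \le (\beta + 1)\,|S'| \le (\beta + 1) \cdot h = p.
\end{align*}
```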

Claim 6. If some $P \subseteq E' - E$ consists of at most $p$ edges such that every
nonsingleton connected component of $\mathcal{H} \cup P$ is strongly connected and
bipartite-complete, then some $S' \subseteq S$ with $|S'| \le h$ has at least one element in
each $S_j$.

To prove this claim, observe that because every connected component of $\mathcal{H} \cup P$ is
strongly connected, for each $j$ with $1 \le j \le \beta$, the set $P$ contains some edges
$b_j \to a_{i_j}$ and $a_{i_j} \to b_{j'}$. Then $i_j \ne 0$ and $s_{i_j}$ exists in $S_j$ by
Rule 3 of the construction of $\mathcal{H}'$, $\mathcal{H}$, and $p$. Let
$S' = \{s_{i_1}, \ldots, s_{i_\beta}\}$. Let $D$ be the connected component of
$\mathcal{H} \cup P$ that contains $a_0$. Then $D$ also contains $a_{i_1}, \ldots, a_{i_\beta}$
and $b_0, \ldots, b_\beta$. By the completeness of $D$, the set $P$ has at least
$(\beta + 1) \cdot |S'|$ edges. Thus $|S'| \le h$ because $|P| \le p = (\beta + 1) \cdot h$.
This finishes the proof of Claim 6 and thus that of Theorem 5.5. □

The next two problems are variants of Problems 10 and 11.

Problem 12 (Protection of a Table). 

• Input: The suppressed graph $\mathcal{H}$ of a table T whose total graph is undirected.

• Output: A set P consisting of the smallest number of published cells of T such 
that the table formed by T with the cells in P also suppressed is protected. 

Problem 13. 

• Input: A bipartite undirected graph $\mathcal{H} = (A, B, E)$.

• Output: A set P consisting of the smallest number of undirected edges between A and B but
not in E such that every nonsingleton connected component of $(A, B, E \cup P)$ is
bipartite-complete.

Lemma 5.6. Problems 12 and 13 can be reduced to each other in linear time.

Proof. The proof is similar to that of Lemma 5.4. □

Theorem 5.7 (Jl|). Problem 13 can be solved in optimal $O(|\mathcal{H}| + p)$ time, where
p is the output size.

Theorem 5.8. Problem 12 can be solved in optimal $O(|\mathcal{H}| + p)$ time, where p is
the output size.

Proof. This theorem follows from Lemma 5.6 and Theorem 5.7. □



6. Discussions. The relationships between the data security of T and the
connectivity of $\mathcal{H}$ are summarized and compared below.



Levels of Data Security            Degrees of Graph Connectivity
all cells                          strongly connected, bridge-free
all rows and columns               strongly connected, bipartite-2-connected
all sets of k rows or k columns    strongly connected, bipartite-(k + 1)-connected
the whole table                    strongly connected, bipartite-complete



Lemma 6.1. Let R be a row or column of T. Let k be the smallest number of
row vertices or column vertices in any nonsingleton connected component of $\mathcal{H}$.

1. If R is protected, then every suppressed cell in R is also protected.

2. If a set of k rows or k columns of T is protected, then every subset of that
set is also protected.

3. If T is protected, then every set of k − 1 rows or k − 1 columns is also protected.

Note that the converses of the above statements are all false.

Proof. Statements 1 and 2 are straightforward. Statement 3 follows from
Theorems O and O. □

The number in a table cell is its value. A cell with a box is suppressed. The lower
and upper bounds of the cells are 0 and 9. The graph is the suppressed graph of
the table. Vertex $R_p$ corresponds to row p, and vertex $C_q$ to column q.

Fig. 1.1. A Table and Its Suppressed Graph.



